Description

This page displays the gap-filled images produced by the fill_gaps.R script.

Different parameters were tested on the following data (note there are 2 different weeks, one with good weekly coverage and one without):

Region: Northwest Atlantic (NWA, 39 to 82 N, 42 to 95 W)  
Sensor: MODIS   
Resolution: 4km   
Processing level: Level 3, binned (L3b)  
Year: 2015  
Weeks: 7, 22  
Pixels outside 0-64 mg m^-3 removed  
Days with < 5% coverage removed  
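The two screening steps above can be sketched as follows. This is an illustration only, not the code in fill_gaps.R: the function names, the inclusive 0-64 bounds, and the representation of each day as a flat list of pixel values (NaN for missing) are all assumptions.

```python
import math

def mask_out_of_range(values, lo=0.0, hi=64.0):
    """Replace values outside [lo, hi] mg m^-3 with NaN (inclusive bounds assumed)."""
    return [v if (not math.isnan(v) and lo <= v <= hi) else math.nan for v in values]

def percent_coverage(values):
    """Percentage of pixels in one image that hold a valid (non-NaN) value."""
    valid = sum(1 for v in values if not math.isnan(v))
    return 100.0 * valid / len(values)

def drop_sparse_days(days, min_percent=5.0):
    """Keep only days whose coverage is at least min_percent."""
    return {day: px for day, px in days.items() if percent_coverage(px) >= min_percent}

# Toy data: three days of 20 pixels each (values are made up).
nan = math.nan
raw = {
    1: [0.5, 70.0, 1.2] + [nan] * 17,  # 70.0 is out of range
    2: [80.0] + [nan] * 19,            # only pixel is out of range
    3: [0.3] + [nan] * 19,             # exactly 5% coverage
}
masked = {day: mask_out_of_range(px) for day, px in raw.items()}
kept = drop_sparse_days(masked)
```

In this toy case day 2 is dropped because its only pixel is out of range, while day 3 is kept because its coverage is exactly 5% and only days below 5% are removed.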

ImputeEOF removes randomly sampled valid pixels for cross-validation. The number of pixels removed is the larger of 30 or 10% of the pixels. The function keeps adding EOFs, calculating the RMSE between the real and reconstructed cross-validation pixels each time, until the difference between the current RMSE and the RMSE of the previous iteration falls below a threshold (i.e. adding the most recent EOF did not significantly improve the RMSE). This threshold, called the “tolerance”, depends on whether you’re filling data in linear space or in log space, since a log RMSE will be only a fraction of the size of a linear RMSE:

Tolerance for filling logged data: 0.001
Tolerance for filling linear data: 0.01
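A rough sketch of the cross-validation sample size and the stopping rule described above. It does not reproduce ImputeEOF itself: the function names, the integer rounding of the 10%, the sign convention of the "difference", and the RMSE sequence in the example are all assumptions made for illustration.

```python
import random

def n_cv_pixels(n_pixels, minimum=30, percent=10):
    """Larger of `minimum` or `percent`% of the pixels (floor rounding is assumed)."""
    return max(minimum, n_pixels * percent // 100)

def sample_cv_pixels(valid_indices, seed=0):
    """Randomly pick the valid pixels to withhold for cross-validation."""
    rng = random.Random(seed)
    k = min(n_cv_pixels(len(valid_indices)), len(valid_indices))
    return rng.sample(valid_indices, k)

def eofs_tried_before_stop(rmse_per_eof, tolerance):
    """Number of EOFs evaluated when the RMSE improvement first drops below tolerance.

    rmse_per_eof[i] is the cross-validation RMSE using i + 1 EOFs. Whether the
    final EOF is kept or discarded is not stated in the text, so this only
    reports how many were tried.
    """
    for i in range(1, len(rmse_per_eof)):
        improvement = rmse_per_eof[i - 1] - rmse_per_eof[i]
        if improvement < tolerance:
            return i + 1
    return len(rmse_per_eof)

# Made-up RMSE sequence in log space, checked against the log tolerance of 0.001.
tried = eofs_tried_before_stop([0.40, 0.30, 0.25, 0.2495, 0.2494], tolerance=0.001)
```

With the made-up sequence, the fourth EOF improves the RMSE by only about 0.0005, which is below the log-space tolerance, so the loop stops there.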

We start by using a year of data to fill the gaps, and compare different methods below. Then, using the best options, we’ll try using a longer time series.

For each method of filling gaps, we’ll examine the following:

The linear regression uses the standard major axis (SMA) method from lmodel2::lmodel2(), since it minimizes the area of the triangle formed between each point and the regression line instead of the distance in the x or y direction alone (i.e. it assumes there is error in both the independent and dependent variables, here the “real” and filled/reconstructed data).
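For reference, the SMA slope has a simple closed form: the ratio of the standard deviations of y and x, carrying the sign of their covariance. The sketch below is a minimal stand-alone illustration of that formula, not a substitute for lmodel2::lmodel2(), which also reports confidence intervals and other regression types; the function name and sample values are made up.

```python
import math
from statistics import mean, stdev

def sma_fit(x, y):
    """Standard major axis fit: slope = sign(cov) * sd(y) / sd(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sign of the covariance
    slope = math.copysign(stdev(y) / stdev(x), sxy)
    intercept = my - slope * mx  # the line passes through the centroid
    return slope, intercept

# Made-up "real" (x) and reconstructed (y) values.
real = [1.0, 2.0, 3.0, 4.0, 5.0]
filled = [1.1, 1.9, 3.2, 3.8, 5.1]
slope_xy, intercept_xy = sma_fit(real, filled)
slope_yx, _ = sma_fit(filled, real)
```

Unlike ordinary least squares, the fit is symmetric: swapping the two variables gives the reciprocal slope, so slope_xy * slope_yx equals 1 up to floating-point error. This is why it suits a comparison where neither variable is error-free.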

Also note that for the tests that involve filling an 8day composite, in situ matchups should be interpreted with caution because of the long temporal bin and the changes that could occur in concentrations and patterns within that time span.

An analysis of DINEOF on the Canadian Pacific coast:
Hilborn A, Costa M. Applications of DINEOF to Satellite-Derived Chlorophyll-a from a Productive Coastal Region. Remote Sensing. 2018; 10(9):1449. https://doi.org/10.3390/rs10091449

8day vs daily

Chla algorithm: OCx
Logged/linear data: Logged

Which is better - filling the gaps in 8day data, or filling gaps in daily data and then averaging it into an 8day image?

Although some R^2 metrics appear better for the daily filled version, overall the 8day cross-validation data has a better fit and less bias (e.g. it identifies some patterns of higher concentration better than the daily fill).
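The two orders of operation being compared can be sketched as below. The `fill` argument is a placeholder for whatever gap-filling routine is used, and `toy_fill` is only a stand-in so the example runs; it is not DINEOF. Grouping consecutive blocks of 8 days and using an arithmetic mean are also assumptions.

```python
import math

def composite(images):
    """Per-pixel mean over a list of images, ignoring NaN."""
    out = []
    for p in range(len(images[0])):
        vals = [img[p] for img in images if not math.isnan(img[p])]
        out.append(sum(vals) / len(vals) if vals else math.nan)
    return out

def to_8day(daily_series):
    """Average consecutive blocks of 8 daily images into composites."""
    return [composite(daily_series[i:i + 8]) for i in range(0, len(daily_series), 8)]

def fill_8day(daily_series, fill):
    """Option 1: composite first, then fill the gaps in the 8day series."""
    return fill(to_8day(daily_series))

def fill_daily_then_average(daily_series, fill):
    """Option 2: fill the gaps in the daily series, then composite."""
    return to_8day(fill(daily_series))

def toy_fill(series):
    """Stand-in gap filler (NaN -> 0.0), used only to make the sketch runnable."""
    return [[0.0 if math.isnan(v) else v for v in img] for img in series]

nan = math.nan
daily = [[1.0, nan]] * 8 + [[3.0, 2.0]] * 8   # 16 days, 2 pixels (made up)
option_1 = fill_8day(daily, toy_fill)
option_2 = fill_daily_then_average(daily, toy_fill)
```

With a real gap filler the two options generally differ, because the daily series has far more gaps and more time steps for the EOFs to represent.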

8day

Number of EOF: 5 
 Total RMSE: 0.2224261 
 Week 7 RMSE: 0.287156 
 Week 22 RMSE: 0.1992027

Daily

Number of EOF: 11 
 Total RMSE: 0.2062114 
 Week 7 RMSE: 0.372335 
 Week 22 RMSE: 0.1860863

OCx vs POLY4

Temporal binning: 8day
Logged/linear data: Logged

Should the OCx or POLY4 algorithm be used? Note that POLY4 has been shown to remove some of the bias in the NWA.
OCx = global band-ratio
POLY4 = regional band-ratio, tuned to NWA

The POLY4 algorithm does appear to remove some of the bias and improve the validity of the reconstructed values.

OCx

Number of EOF: 5 
 Total RMSE: 0.2224261 
 Week 7 RMSE: 0.287156 
 Week 22 RMSE: 0.1992027

POLY4

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 7 RMSE: 0.3356319 
 Week 22 RMSE: 0.2588712

Log vs linear

Temporal binning: 8day
Chla algorithm: POLY4

Should we use logged data or linear data to fill the gaps?
Note the process for the log option:

Logged data gives a smoother fill as it is not negatively impacted by isolated spikes over relatively low and consistent concentrations.
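A small illustration of why an isolated spike matters much less in log space, and why log RMSE values are so much smaller than linear ones. The values are made up, and base-10 logs are an assumption, since the text does not state the base.

```python
import math

def rmse(real, recon):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(real, recon)) / len(real))

def log10_all(values):
    """Base-10 log of every value (base is assumed)."""
    return [math.log10(v) for v in values]

# Low, consistent concentrations with one isolated spike of 20 mg m^-3,
# which the reconstruction underestimates.
real = [0.5, 0.6, 0.5, 20.0]
recon = [0.5, 0.5, 0.6, 2.0]

rmse_linear = rmse(real, recon)
rmse_log = rmse(log10_all(real), log10_all(recon))
```

In linear space the single spike contributes an error of 18 and dominates the RMSE. In log space the same pixel contributes an error of 1 (one order of magnitude), so it has far less influence on the fit.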

Log

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 7 RMSE: 0.3356319 
 Week 22 RMSE: 0.2588712

Linear

Number of EOF: 5 
 Total RMSE: 1.806879 
 Week 7 RMSE: 2.12901 
 Week 22 RMSE: 1.298033

Longer time series

If more satellite images are used in the algorithm, will it improve the results?

Hilborn and Costa (2018) found that pixel reconstruction improved with more data in a smaller region on the Canadian Pacific coast. Up until this point we have only used one year of data to fill the gaps, but here we’ll try adding more (an equal number of years on either side of the target year, 2015).
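Selecting an equal number of years on either side of the target year can be written as a small helper. The function name is hypothetical.

```python
def years_around(target_year, n_years):
    """Return n_years consecutive years centred on target_year (n_years must be odd)."""
    if n_years < 1 or n_years % 2 == 0:
        raise ValueError("n_years must be a positive odd number")
    half = (n_years - 1) // 2
    return list(range(target_year - half, target_year + half + 1))

windows = {n: years_around(2015, n) for n in (1, 3, 5, 7)}
```

For example, the 3-year window is 2014 to 2016 and the 7-year window is 2012 to 2018.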

Note that the 3year/5year DINEOF runs use the same cross-validation pixels for 2015 with extra randomly-selected pixels from the remaining years. Also, the CV regression below is performed using only the CV pixels for 2015 to give a more accurate comparison between methods.
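One way to picture this bookkeeping is shown below. This is a sketch only: the names, the representation of a pixel as a (year, index) pair, and the number of extra pixels drawn are assumptions, since the text does not say how many extra pixels are added.

```python
import random

def extend_cv_pixels(cv_target, candidates_other_years, n_extra, seed=0):
    """Keep the target-year CV pixels and append randomly chosen ones from other years."""
    rng = random.Random(seed)
    extra = rng.sample(candidates_other_years, n_extra)
    return list(cv_target) + extra

def target_year_only(cv_pixels, target_year=2015):
    """Subset used for the CV regression: target-year pixels only."""
    return [(yr, px) for yr, px in cv_pixels if yr == target_year]

cv_2015 = [(2015, 3), (2015, 8)]
others = [(2014, i) for i in range(10)] + [(2016, i) for i in range(10)]
all_cv = extend_cv_pixels(cv_2015, others, n_extra=5)
regression_cv = target_year_only(all_cv)
```

Because the 2015 pixels are identical in every run, the regression compares each time-series length on exactly the same withheld data.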

Overall, expanding the time series gives a slight improvement to the results. Based on the RMSE summary plot at the bottom, the best results are achieved with about 3 years of data: beyond that, weeks with good percent coverage improve only very slightly, and the RMSE for weeks with bad percent coverage starts rising.

1 year

Number of EOF: 6 
 Total RMSE: 0.257584 
 Week 7 RMSE: 0.3356319 
 Week 22 RMSE: 0.2588712

3 years

Number of EOF: 11 
 Total RMSE: 0.2314012 
 Week 7 RMSE: 0.2961003 
 Week 22 RMSE: 0.2433103

5 years

Number of EOF: 13 
 Total RMSE: 0.2248182 
 Week 7 RMSE: 0.3098051 
 Week 22 RMSE: 0.2395364

7 years

Number of EOF: 15 
 Total RMSE: 0.2219548 
 Week 7 RMSE: 0.3148155 
 Week 22 RMSE: 0.2332346

Summary

Number of EOFs for 1/3/5/7 years: 6/11/13/15
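The values reported in the tabs above can be collected and compared directly. The numbers below are copied from those tabs; the dictionary layout and helper name are only for illustration.

```python
# RMSE results by number of years used, copied from the sections above.
results = {
    1: {"n_eof": 6,  "total": 0.257584,  "week7": 0.3356319, "week22": 0.2588712},
    3: {"n_eof": 11, "total": 0.2314012, "week7": 0.2961003, "week22": 0.2433103},
    5: {"n_eof": 13, "total": 0.2248182, "week7": 0.3098051, "week22": 0.2395364},
    7: {"n_eof": 15, "total": 0.2219548, "week7": 0.3148155, "week22": 0.2332346},
}

def best_window(metric):
    """Number of years giving the lowest value of the chosen RMSE metric."""
    return min(results, key=lambda yrs: results[yrs][metric])

best = {metric: best_window(metric) for metric in ("total", "week7", "week22")}
# best == {"total": 7, "week7": 3, "week22": 7}
```

The total and week 22 RMSE keep falling slowly out to 7 years, while the week 7 RMSE is lowest at 3 years and rises afterwards.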

Larger area

Here we’ll try adjusting the region used to fill the data.